DataOrbit 2025 Project

Author

Team PACAN

Loading in and Exploring Data

Here we Load in and Explore our Data

Code
SuicideRates <- read_csv("./archive/suicide_rates_1990-2022.csv", show_col_types = FALSE) %>% clean_names()

SuicideRates %>% colnames()
 [1] "region_code"                     "region_name"                    
 [3] "country_code"                    "country_name"                   
 [5] "year"                            "sex"                            
 [7] "age_group"                       "generation"                     
 [9] "suicide_count"                   "cause_specific_death_percentage"
[11] "death_rate_per100k"              "population"                     
[13] "gdp"                             "gdp_per_capita"                 
[15] "gross_national_income"           "gni_per_capita"                 
[17] "inflation_rate"                  "employment_population_ratio"    

Here we Load in and Explore our Data

Code
sum(is.na(SuicideRates))
[1] 82117

Check the original N/A observations

Code
# Remove these columns
SuicideRates <- SuicideRates %>%
select(-region_code, -country_code, -gdp, -gross_national_income)

# These variables were redundant because we kept per capital variables.

Remove the columns that we deem as redundant, cut out gdp and gross_national_income in favor of the per capita versions.

Code
# keep death counts less than 0, cause_specific_death_percentage > 0
SuicideRates <- SuicideRates %>% 
  filter(suicide_count > 0, cause_specific_death_percentage > 0, death_rate_per100k > 0, population > 0, gni_per_capita > 0, inflation_rate > 0, employment_population_ratio > 0)

## Hone in on the number of suicides, so we omitted suicide counts that were less than 0. 

To get rid of the N/A values we remove values of each of these variables that are less than 0, since we really want to hone in on the number of suicides we omit the suicide counts that are less than 0. This way we ensure that all numerical variables are positive.

Code
# renaming variables
SuicideRates <- SuicideRates %>% rename(
  region = region_name,
  country = country_name,
  year = year,
  sex = sex,
  age = age_group,
  gen = generation,
  suicides = suicide_count,
  cause_death_pct = cause_specific_death_percentage,
  death_rate = death_rate_per100k,
  pop = population,
  gdp_pc = gdp_per_capita,
  gni_pc = gni_per_capita,
  inflation = inflation_rate,
  emp_ratio = employment_population_ratio
)

Here we rename the variables that are too long to make it easier to display in EDA, and for convenience.

Code
SuicideRates <- SuicideRates %>%
  group_by(region, country, year, sex, age, gen, pop, gdp_pc, gni_pc, inflation, emp_ratio) %>%
  summarize(
    suicides = sum(suicides, na.rm = TRUE),
    cause_death_pct = mean(cause_death_pct, na.rm = TRUE),
    death_rate = mean(death_rate, na.rm = TRUE),
    .groups = "drop"
  )

Group data by region, country, year, sex, age, generation, and economic factors. Then we summarize the suicide counts (total) and the other variables we take the mean or (average). .groups = “drop” prevents unnecessary grouping.

Code
vis_miss(SuicideRates)

Uses the naniar package to create a visual representation of missing values.

Code
## 23326 Obs 14 Col
dim(SuicideRates)
[1] 23326    14

Looks at the dimension of our dataset, we see that we have 23,326 observations and 14 columns.

Data Splitting for Model Evaluation

Code
SuicideRates <- SuicideRates %>% 
  mutate(across(where(is.character), as.factor))

set.seed(123)
data_split <- initial_split(SuicideRates, strata = "suicides", prop = 0.75)
rates_train <- training(data_split)
rates_test <- testing(data_split)
rates_fold <- vfold_cv(rates_train, v = 5)

Data splitting for modeling, this converts categorical variables to factors and splits the data into training (75%) and testing (25%). 5-fold cross-validation is used for model evaluation.

Exploratory Data Analysis (EDA)

Code
library(corrplot)
library(dplyr)

# Ensure the dataset has numeric columns selected correctly
numeric_data <- SuicideRates %>% select(where(is.numeric))

# Compute correlation matrix
cor_matrix <- cor(numeric_data, use = "pairwise.complete.obs")

# Plot correlation
corrplot(cor_matrix, method = "number", type = "lower", diag = FALSE)

This selects only numeric variables from the dataset and calculates the correlation matrix and visualizes this correlation. We see a moderate to low correlation between most of our variables and a high correlation between gdp_pc and gni_pc which makes sense because they are both economic data.

Code
# Aggregate deaths by generation category
total_deaths <- SuicideRates %>%
  group_by(gen) %>%
  summarize(sum_sui = sum(suicides))

# Reorder 'gen' by sum_sui in descending order
total_deaths$gen <- reorder(total_deaths$gen, total_deaths$sum_sui, decreasing = TRUE)

# Create a bar plot of total deaths per generation without legend and colors
ggplot(total_deaths, aes(x = gen, y = sum_sui)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # Set fill to a neutral color
  scale_y_continuous(labels = scales::comma) +  # Add commas to y-axis labels
  labs(title = "Gen X and Baby Boomers Are The Most Likely To Commit Suicide", x = "Generation", y = "Total Deaths") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"  # Remove the legend
  )

From this graphic, we see that the top two generations with the highest suicide count out of all of the generations are Generation X and Baby Boomers. This means that our parents and grandparents are a part of this statistic, unfortunately. On the contrary, the generation with the lowest suicide rate are the Silent Generation and Gen Alpha. This makes sense because they’re still a relatively young.
Code
# Aggregate deaths by generation category
total_deaths <- SuicideRates %>%
  group_by(age) %>%
  summarize(sum_sui = sum(suicides))

# Reorder 'gen' by sum_sui in descending order
total_deaths$age <- reorder(total_deaths$age, total_deaths$sum_sui, decreasing = TRUE)

# Create a bar plot of total deaths per generation without legend and colors
ggplot(total_deaths, aes(x = age, y = sum_sui)) +
  geom_bar(stat = "identity", fill = "gray") +  # Set fill to a neutral color
  scale_y_continuous(labels = scales::comma) +  # Add commas to y-axis labels
  labs(title = "Gen X and Baby Boomers Are The Most Likely To Commit Suicide", x = "Generation", y = "Total Deaths") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"  # Remove the legend
  )

From this bar graph we see that Gen X or people that are aged 35-54 and Baby Boomers those aged 55-74 are the most likely to commit suicide.

Code
ggplot(SuicideRates, aes(x = inflation, y = suicides)) + geom_point() + labs() + labs(title = "Relationship Between Inflation and Number of Suicides", x = "Inflation Rate (percentage)", y = "Number of Suicides") + theme_minimal()

Code
# SuicideRates %>% filter(inflation == max(SuicideRates$inflation))

# the outlier values are from Ukraine in 1993

This is a scatter plot of inflation vs. suicide counts. It identifies periods in which there is a high inflation rate. In the highest inflation rate we note that there is a point in Ukraine in 1993 that had hyper-inflation caused by something that was not war.

Code
total_deaths <- SuicideRates %>%
  group_by(sex) %>%
  summarize(sum_sui = sum(suicides))

# table(SuicideRates$sex)

# print(paste0("percent female ", 28296/60056))
# print(paste0("percent male ", 31760/60056))

ggplot(total_deaths, aes(x = sex, y = sum_sui)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # Set fill to a neutral color
  scale_y_continuous(labels = scales::comma) +  # Add commas to y-axis labels
  labs(title = "Men Commit More Suicide Than Women", x = "Sex", y = "Total Suicide Counts") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none")  # Remove the legend

The proportion of male to female in this data set are 0.52 and 0.47. There are slightly more men in the data set. However, the tiny bit of imbalance is not enough to explain the massive disparity between male and female suicde counts. Men have a much higher suicide count than women over all generations. The actual percentage for men is 77%, while women is 0.2233983.
Code
# Aggregate deaths by country
total_deaths <- SuicideRates %>%
  group_by(country) %>%
  summarize(sum_sui = sum(suicides))

# Find top 10 countries by total suicides
top_10_deaths <- total_deaths %>%
  arrange(desc(sum_sui)) %>%  # Order by total suicides in descending order
  head(10)  # Select top 10 rows

# Reorder 'gen' by sum_sui in descending order
top_10_deaths$country <- reorder(top_10_deaths$country, top_10_deaths$sum_sui, decreasing = TRUE)

# Create a bar plot of total deaths per generation without legend and colors
ggplot(top_10_deaths, aes(x = country, y = sum_sui)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # Set fill to a neutral color
  scale_y_continuous(labels = scales::comma) +  # Add commas to y-axis labels
  labs(title = "Top 10 Countries with The Highest Suicide Counts", x = "Country", y = "Total Deaths") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none")  # Remove the legend

The top two countries with the highest suicde counts are Russian Federation or Russia and the United States.
Code
# Create a bar plot of total deaths per generation without legend and colors
ggplot(top_10_deaths, aes(x = country, y = sum_sui)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # Set fill to a neutral color
  scale_y_continuous(labels = scales::comma) +  # Add commas to y-axis labels
  labs(title = "Top 10 Countries with the highest suicide count", x = "Country", y = "Total Deaths") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none")  # Remove the legend

This bar graph shows that the United States and Russian Federation have the highest count of suicides from the period of 1990-2022. This could be attributed to the aftermath of the cold war to the fall of the Soviet Union, among other external factors.

Code
# Aggregate deaths by country
total_deaths <- SuicideRates %>%
  group_by(country) %>%
  summarize(sum_sui = sum(suicides))

# Find top 10 countries by total suicides
top_10_deaths <- total_deaths %>%
  arrange(desc(sum_sui)) %>%  # Order by total suicides in descending order
  tail(10)  # Select top 10 rows

# Reorder 'gen' by sum_sui in descending order
top_10_deaths$country <- reorder(top_10_deaths$country, top_10_deaths$sum_sui, decreasing = FALSE)

# Create a bar plot of total deaths per generation without legend and colors
ggplot(top_10_deaths, aes(x = country, y = sum_sui)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # Set fill to a neutral color
  scale_y_continuous(labels = scales::comma) +  # Add commas to y-axis labels
  labs(title = "Top 10 Countries with the Lowest Suicide Count", x = "Country", y = "Total Suicides") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none")  # Remove the legend

The top two countries with the lowest suicide counts are Maldives and Iraq.
Code
library(RColorBrewer)
#bob = sample(c("0-14 years", "15-24 years", "25-34 years", "35-54 years", "55-74 years", "75+ years"), size = 1000, replace = TRUE, prob = c(0.1364572, 0.1742691, 0.1776130, 0.1803996, 0.1729401, 0.1583212))

# geom_text(aes(label = suicide_percentages),
            #position = position_stack(vjust = 0.5)) +
#suicide_percentages = c("13.6%", "17.4%", "17.7", "18%", "17.2", "15.8")

suicide_percentages <- c(0.1364572, 0.1742691, 0.1776130, 0.1803996, 0.1729401, 0.1583212)

age_labels <- c("0-14 years", "15-24 years", "25-34 years", "35-54 years", "55-74 years", "75+ years")
percents <- round(suicide_percentages/sum(suicide_percentages)*100)
age_labels <- paste(age_labels, percents, sep = ": ")
# add percents to labels
age_labels <- paste(age_labels,"%") # ad % to labels

pie(suicide_percentages,labels = age_labels, col=brewer.pal(length(age_labels), "Pastel1"),
   main="Proportion of Suicides by Age Group")

Variable Selection

Code
full <- glm(suicides ~ ., data = na.omit(rates_train),  family = poisson)
# full %>% summary()

Fitting a Poisson regression model to predict suicides using all features that are available.

Code
## stepwise selection with AIC
reduced_by_AIC <- stats::step(full)
Code
summary(reduced_by_AIC)
Code
anova(reduced_by_AIC, full, test = "Chisq")

Stepwise regression (AIC) to automatically remove unnecessary variables. stats::step() removes variables to minimize the AIC (Akaike Information Criterion). ANOVA compares the full model to the reduced model. This balances model fit and complexity. This leads to a better trade-off between explanatory power and over-fitting.

Code
glm(data = SuicideRates, suicides ~ ., family = "poisson") %>% summary()

# SuicideRates %>% filter(country == "Slovenia") %>% select(death_rate)

Tests Poisson regression for suicides and Gaussian regression for death_rate. For context, Poisson is used for count data, while Gaussian is used for continuous data like rate-based predictions.

Code
library(glmnet)
x <- model.matrix(suicides ~ ., data = na.omit(rates_train))[,-1] 
# remove intercept column
y <- na.omit(rates_train)$suicides

# alpha 1 for ridge
# alpha 0 for lasso
# alpha 0.5 for elastic net.
fit_lasso <- glmnet(x, y, alpha = 1, family = "poisson")
Code
plot(fit_lasso, main = "Feature Shrinkage via Lasso Regression")

Numeric matrix x is created for all predictors. The intercept column is removed and the target variable of suicide counts is represented by y. Lasso (alpha = 1) applies L1 regularization which shrinks coefficients and removes predictors that are weak. Lasso helps in feature selection and it does this by setting coefficients to zero.

Code
## Cross-Validation to find the best Lambda for regularization
set.seed(123)

fit.glmnet.5foldCV_lasso <- cv.glmnet(x, y, alpha = 1, nfolds = 5) # reduce to 0

fit.glmnet.5foldCV_lasso # tuned lambda = 0.06

# coef(fit.glmnet.5foldCV_lasso, s= "lambda.min") 
# our variables at the tuned lambda

5-fold cross-validation is done. This helps finds optimal penalty parameter (lambda), and extracts the best coefficients after regularization.

Code
# we were curious about elastic net 
fit_net <- glmnet(x, y, alpha = 0.5, family = "poisson")
plot(fit_net, main = "Feature Shrinkage via Elastic Net")

Elastic Net is a combination of both Lasso (L1) and Ridge (L2) regularization. This provides a balance between removing weak features (L1) and reducing multi-collinearity (L2).

Code
# Why not try ridge as well
fit_ridge <- glmnet(x, y, alpha = 0, family = "poisson")
plot(fit_ridge, main = "Feature Shrinkage via Ridge Net")

Ridge applies L2 regularization, but unlike Ridge does not remove variables completely but reduces the importance of these variables.

Final Model Selection

Code
glm(suicides ~ . - region - gen, data =SuicideRates) %>% summary()

Call:
glm(formula = suicides ~ . - region - gen, data = SuicideRates)

Coefficients:
                                                              Estimate
(Intercept)                                                  7.198e+03
countryArmenia                                               4.426e+01
countryAustralia                                            -2.091e+02
countryAustria                                              -2.042e+02
countryAzerbaijan                                            1.310e+01
countryBahamas                                              -1.799e+02
countryBahrain                                              -4.856e+01
countryBarbados                                             -2.771e+02
countryBelarus                                              -1.225e+02
countryBelgium                                              -2.063e+02
countryBelize                                               -3.382e+02
countryBrazil                                               -2.104e+03
countryBrunei Darussalam                                    -2.850e+02
countryBulgaria                                             -1.221e+02
countryCabo Verde                                           -2.448e+02
countryCanada                                               -2.600e+02
countryChile                                                -1.329e+02
countryChina, Hong Kong SAR                                 -1.668e+02
countryColombia                                             -3.913e+02
countryCosta Rica                                           -9.209e+00
countryCroatia                                              -1.716e+02
countryCyprus                                               -9.487e+01
countryCzechia                                              -1.146e+02
countryDenmark                                              -1.648e+02
countryDominican Republic                                   -3.351e+01
countryEcuador                                              -5.476e+01
countryEgypt                                                -1.269e+03
countryEl Salvador                                          -1.792e+01
countryEstonia                                              -1.485e+02
countryFiji                                                 -4.996e+01
countryFinland                                              -2.025e+02
countryFrance                                               -3.020e+02
countryGeorgia                                               4.097e+01
countryGermany                                              -4.986e+02
countryGreece                                               -1.220e+02
countryGuatemala                                            -5.185e+01
countryGuyana                                               -2.585e+02
countryHungary                                              -1.833e+02
countryIceland                                              -2.934e+02
countryIran (Islamic Republic of)                           -9.738e+02
countryIraq                                                 -4.604e+02
countryIreland                                              -1.416e+02
countryIsrael                                               -8.763e+01
countryItaly                                                -6.183e+02
countryJamaica                                               2.874e+01
countryJapan                                                -1.771e+02
countryJordan                                               -1.251e+02
countryKazakhstan                                           -6.671e+01
countryKuwait                                               -5.786e+01
countryKyrgyzstan                                            3.358e+01
countryLatvia                                               -1.957e+02
countryLebanon                                               1.946e+01
countryLithuania                                            -2.393e+02
countryLuxembourg                                           -3.753e+02
countryMalaysia                                             -3.173e+02
countryMaldives                                             -7.736e+01
countryMalta                                                -2.230e+02
countryMauritius                                            -8.677e+01
countryMexico                                               -1.196e+03
countryMongolia                                             -4.426e+01
countryMontenegro                                           -2.197e+02
countryNetherlands                                          -1.847e+02
countryNew Zealand                                          -9.520e+01
countryNicaragua                                             2.743e+01
countryNorth Macedonia                                      -7.599e+01
countryNorway                                               -1.823e+02
countryOman                                                 -3.946e+01
countryPanama                                                2.827e+01
countryParaguay                                              5.332e+01
countryPeru                                                 -2.278e+02
countryPhilippines                                          -1.095e+03
countryPoland                                               -1.795e+02
countryPortugal                                             -1.050e+02
countryQatar                                                -1.312e+02
countryRepublic of Korea                                    -1.217e+02
countryRepublic of Moldova                                  -2.433e+01
countryRomania                                              -1.120e+02
countryRussian Federation                                    1.006e+03
countrySaint Lucia                                          -3.925e+02
countrySaint Vincent and the Grenadines                     -5.896e+02
countrySerbia                                               -1.364e+02
countrySingapore                                            -1.615e+02
countrySlovakia                                             -6.824e+01
countrySlovenia                                             -2.428e+02
countrySouth Africa                                         -6.488e+02
countrySpain                                                -4.394e+02
countrySri Lanka                                            -1.389e+02
countrySuriname                                             -2.837e+02
countrySweden                                               -1.669e+02
countrySwitzerland                                          -2.286e+02
countryT?rkiye                                              -9.602e+02
countryTajikistan                                           -5.686e+01
countryThailand                                             -5.360e+02
countryTrinidad and Tobago                                  -1.051e+02
countryUkraine                                               3.107e+01
countryUnited Arab Emirates                                 -9.096e+01
countryUnited Kingdom of Great Britain and Northern Ireland -6.014e+02
countryUnited States of America                             -1.708e+03
countryUruguay                                              -1.215e+02
countryUzbekistan                                           -2.153e+02
countryVenezuela (Bolivarian Republic of)                   -2.611e+02
year                                                        -3.699e+00
sexMale                                                      1.415e+02
age15-24 years                                               1.344e+02
age25-34 years                                               1.861e+02
age35-54 years                                               4.903e+02
age55-74 years                                               3.012e+02
age75+ years                                                -7.621e+01
pop                                                          1.574e-05
gdp_pc                                                       2.460e-03
gni_pc                                                       2.180e-04
inflation                                                    3.656e-02
emp_ratio                                                   -3.590e+00
cause_death_pct                                              4.677e+00
death_rate                                                   9.903e+00
                                                            Std. Error t value
(Intercept)                                                  2.203e+03   3.267
countryArmenia                                               6.779e+01   0.653
countryAustralia                                             7.145e+01  -2.926
countryAustria                                               6.864e+01  -2.974
countryAzerbaijan                                            8.799e+01   0.149
countryBahamas                                               1.017e+02  -1.769
countryBahrain                                               1.107e+02  -0.439
countryBarbados                                              1.104e+02  -2.509
countryBelarus                                               6.977e+01  -1.756
countryBelgium                                               6.843e+01  -3.015
countryBelize                                                8.344e+01  -4.053
countryBrazil                                                2.041e+02 -10.308
countryBrunei Darussalam                                     1.230e+02  -2.317
countryBulgaria                                              6.388e+01  -1.912
countryCabo Verde                                            2.323e+02  -1.054
countryCanada                                                7.609e+01  -3.417
countryChile                                                 6.385e+01  -2.082
countryChina, Hong Kong SAR                                  7.497e+01  -2.225
countryColombia                                              7.675e+01  -5.099
countryCosta Rica                                            6.443e+01  -0.143
countryCroatia                                               6.655e+01  -2.578
countryCyprus                                                8.769e+01  -1.082
countryCzechia                                               6.571e+01  -1.745
countryDenmark                                               7.067e+01  -2.331
countryDominican Republic                                    6.508e+01  -0.515
countryEcuador                                               6.781e+01  -0.808
countryEgypt                                                 1.107e+02 -11.462
countryEl Salvador                                           6.562e+01  -0.273
countryEstonia                                               7.104e+01  -2.090
countryFiji                                                  8.781e+01  -0.569
countryFinland                                               6.783e+01  -2.986
countryFrance                                                9.170e+01  -3.294
countryGeorgia                                               6.806e+01   0.602
countryGermany                                               1.081e+02  -4.612
countryGreece                                                8.560e+01  -1.425
countryGuatemala                                             6.647e+01  -0.780
countryGuyana                                                6.776e+01  -3.815
countryHungary                                               6.580e+01  -2.786
countryIceland                                               8.279e+01  -3.543
countryIran (Islamic Republic of)                            1.290e+02  -7.547
countryIraq                                                  2.637e+02  -1.746
countryIreland                                               7.000e+01  -2.023
countryIsrael                                                6.903e+01  -1.269
countryItaly                                                 8.855e+01  -6.982
countryJamaica                                               1.127e+02   0.255
countryJapan                                                 1.529e+02  -1.158
countryJordan                                                1.262e+02  -0.991
countryKazakhstan                                            7.095e+01  -0.940
countryKuwait                                                1.020e+02  -0.567
countryKyrgyzstan                                            7.377e+01   0.455
countryLatvia                                                6.638e+01  -2.949
countryLebanon                                               1.340e+02   0.145
countryLithuania                                             6.599e+01  -3.627
countryLuxembourg                                            8.404e+01  -4.465
countryMalaysia                                              8.026e+01  -3.953
countryMaldives                                              1.428e+02  -0.542
countryMalta                                                 7.136e+01  -3.125
countryMauritius                                             6.399e+01  -1.356
countryMexico                                                1.268e+02  -9.426
countryMongolia                                              9.927e+01  -0.446
countryMontenegro                                            1.110e+02  -1.979
countryNetherlands                                           7.105e+01  -2.600
countryNew Zealand                                           6.987e+01  -1.362
countryNicaragua                                             7.477e+01   0.367
countryNorth Macedonia                                       7.047e+01  -1.078
countryNorway                                                7.699e+01  -2.368
countryOman                                                  1.383e+02  -0.285
countryPanama                                                6.843e+01   0.413
countryParaguay                                              7.259e+01   0.734
countryPeru                                                  7.960e+01  -2.861
countryPhilippines                                           1.141e+02  -9.597
countryPoland                                                7.357e+01  -2.440
countryPortugal                                              6.795e+01  -1.546
countryQatar                                                 1.420e+02  -0.924
countryRepublic of Korea                                     8.324e+01  -1.462
countryRepublic of Moldova                                   6.896e+01  -0.353
countryRomania                                               6.744e+01  -1.661
countryRussian Federation                                    1.637e+02   6.145
countrySaint Lucia                                           8.554e+01  -4.589
countrySaint Vincent and the Grenadines                      9.953e+01  -5.923
countrySerbia                                                6.617e+01  -2.061
countrySingapore                                             8.308e+01  -1.944
countrySlovakia                                              6.822e+01  -1.000
countrySlovenia                                              6.882e+01  -3.529
countrySouth Africa                                          8.186e+01  -7.926
countrySpain                                                 7.846e+01  -5.600
countrySri Lanka                                             6.963e+01  -1.995
countrySuriname                                              7.028e+01  -4.036
countrySweden                                                7.111e+01  -2.347
countrySwitzerland                                           8.132e+01  -2.811
countryT?rkiye                                               1.087e+02  -8.830
countryTajikistan                                            9.971e+01  -0.570
countryThailand                                              1.024e+02  -5.232
countryTrinidad and Tobago                                   6.873e+01  -1.529
countryUkraine                                               7.973e+01   0.390
countryUnited Arab Emirates                                  2.707e+02  -0.336
countryUnited Kingdom of Great Britain and Northern Ireland  9.234e+01  -6.513
countryUnited States of America                              3.207e+02  -5.325
countryUruguay                                               6.496e+01  -1.871
countryUzbekistan                                            8.823e+01  -2.440
countryVenezuela (Bolivarian Republic of)                    1.281e+02  -2.038
year                                                         1.105e+00  -3.346
sexMale                                                      1.026e+01  13.794
age15-24 years                                               1.724e+01   7.795
age25-34 years                                               1.689e+01  11.016
age35-54 years                                               1.674e+01  29.285
age55-74 years                                               1.778e+01  16.939
age75+ years                                                 1.998e+01  -3.815
pop                                                          1.055e-06  14.916
gdp_pc                                                       9.747e-04   2.524
gni_pc                                                       1.334e-03   0.163
inflation                                                    2.704e-02   1.352
emp_ratio                                                    1.782e+00  -2.015
cause_death_pct                                              7.953e-01   5.880
death_rate                                                   3.066e-01  32.299
                                                            Pr(>|t|)    
(Intercept)                                                 0.001090 ** 
countryArmenia                                              0.513856    
countryAustralia                                            0.003434 ** 
countryAustria                                              0.002939 ** 
countryAzerbaijan                                           0.881634    
countryBahamas                                              0.076955 .  
countryBahrain                                              0.660830    
countryBarbados                                             0.012104 *  
countryBelarus                                              0.079093 .  
countryBelgium                                              0.002572 ** 
countryBelize                                               5.07e-05 ***
countryBrazil                                                < 2e-16 ***
countryBrunei Darussalam                                    0.020514 *  
countryBulgaria                                             0.055942 .  
countryCabo Verde                                           0.291961    
countryCanada                                               0.000634 ***
countryChile                                                0.037371 *  
countryChina, Hong Kong SAR                                 0.026097 *  
countryColombia                                             3.44e-07 ***
countryCosta Rica                                           0.886356    
countryCroatia                                              0.009951 ** 
countryCyprus                                               0.279335    
countryCzechia                                              0.081033 .  
countryDenmark                                              0.019744 *  
countryDominican Republic                                   0.606653    
countryEcuador                                              0.419329    
countryEgypt                                                 < 2e-16 ***
countryEl Salvador                                          0.784744    
countryEstonia                                              0.036628 *  
countryFiji                                                 0.569379    
countryFinland                                              0.002833 ** 
countryFrance                                               0.000989 ***
countryGeorgia                                              0.547187    
countryGermany                                              4.01e-06 ***
countryGreece                                               0.154213    
countryGuatemala                                            0.435381    
countryGuyana                                               0.000137 ***
countryHungary                                              0.005339 ** 
countryIceland                                              0.000396 ***
countryIran (Islamic Republic of)                           4.63e-14 ***
countryIraq                                                 0.080866 .  
countryIreland                                              0.043063 *  
countryIsrael                                               0.204340    
countryItaly                                                2.98e-12 ***
countryJamaica                                              0.798625    
countryJapan                                                0.246764    
countryJordan                                               0.321588    
countryKazakhstan                                           0.347053    
countryKuwait                                               0.570658    
countryKyrgyzstan                                           0.649034    
countryLatvia                                               0.003193 ** 
countryLebanon                                              0.884517    
countryLithuania                                            0.000288 ***
countryLuxembourg                                           8.03e-06 ***
countryMalaysia                                             7.74e-05 ***
countryMaldives                                             0.587903    
countryMalta                                                0.001779 ** 
countryMauritius                                            0.175137    
countryMexico                                                < 2e-16 ***
countryMongolia                                             0.655725    
countryMontenegro                                           0.047786 *  
countryNetherlands                                          0.009333 ** 
countryNew Zealand                                          0.173087    
countryNicaragua                                            0.713723    
countryNorth Macedonia                                      0.280906    
countryNorway                                               0.017894 *  
countryOman                                                 0.775359    
countryPanama                                               0.679478    
countryParaguay                                             0.462678    
countryPeru                                                 0.004223 ** 
countryPhilippines                                           < 2e-16 ***
countryPoland                                               0.014679 *  
countryPortugal                                             0.122158    
countryQatar                                                0.355485    
countryRepublic of Korea                                    0.143823    
countryRepublic of Moldova                                  0.724264    
countryRomania                                              0.096791 .  
countryRussian Federation                                   8.13e-10 ***
countrySaint Lucia                                          4.48e-06 ***
countrySaint Vincent and the Grenadines                     3.20e-09 ***
countrySerbia                                               0.039281 *  
countrySingapore                                            0.051948 .  
countrySlovakia                                             0.317156    
countrySlovenia                                             0.000419 ***
countrySouth Africa                                         2.37e-15 ***
countrySpain                                                2.16e-08 ***
countrySri Lanka                                            0.046009 *  
countrySuriname                                             5.46e-05 ***
countrySweden                                               0.018934 *  
countrySwitzerland                                          0.004944 ** 
countryT?rkiye                                               < 2e-16 ***
countryTajikistan                                           0.568500    
countryThailand                                             1.69e-07 ***
countryTrinidad and Tobago                                  0.126205    
countryUkraine                                              0.696821    
countryUnited Arab Emirates                                 0.736834    
countryUnited Kingdom of Great Britain and Northern Ireland 7.52e-11 ***
countryUnited States of America                             1.02e-07 ***
countryUruguay                                              0.061401 .  
countryUzbekistan                                           0.014676 *  
countryVenezuela (Bolivarian Republic of)                   0.041521 *  
year                                                        0.000822 ***
sexMale                                                      < 2e-16 ***
age15-24 years                                              6.73e-15 ***
age25-34 years                                               < 2e-16 ***
age35-54 years                                               < 2e-16 ***
age55-74 years                                               < 2e-16 ***
age75+ years                                                0.000137 ***
pop                                                          < 2e-16 ***
gdp_pc                                                      0.011603 *  
gni_pc                                                      0.870211    
inflation                                                   0.176304    
emp_ratio                                                   0.043899 *  
cause_death_pct                                             4.15e-09 ***
death_rate                                                   < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 460104)

    Null deviance: 2.0816e+10  on 23325  degrees of freedom
Residual deviance: 1.0679e+10  on 23211  degrees of freedom
AIC: 370466

Number of Fisher Scoring iterations: 2
Code
anova(lm(gdp_pc ~ country, data = SuicideRates))
Analysis of Variance Table

Response: gdp_pc
             Df     Sum Sq    Mean Sq F value    Pr(>F)    
country     100 7.4685e+12 7.4685e+10  881.04 < 2.2e-16 ***
Residuals 23225 1.9688e+12 8.4770e+07                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
anova(lm(gni_pc ~ country, data = SuicideRates))
Analysis of Variance Table

Response: gni_pc
             Df     Sum Sq    Mean Sq F value    Pr(>F)    
country     100 5.7157e+12 5.7157e+10  694.96 < 2.2e-16 ***
Residuals 23225 1.9101e+12 8.2245e+07                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
anova(lm(gni_pc ~ gdp_pc, data = SuicideRates))
Analysis of Variance Table

Response: gni_pc
             Df     Sum Sq    Mean Sq F value    Pr(>F)    
gdp_pc        1 5.9823e+12 5.9823e+12   84896 < 2.2e-16 ***
Residuals 23324 1.6436e+12 7.0466e+07                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
library(tidyverse)
SuicideRates %>% select(country, gdp_pc, gni_pc, year) 
# A tibble: 23,326 × 4
   country    gdp_pc gni_pc  year
   <fct>       <dbl>  <dbl> <dbl>
 1 Cabo Verde  3880.   5970  2011
 2 Cabo Verde  3880.   5970  2011
 3 Cabo Verde  3880.   5970  2011
 4 Cabo Verde  3880.   5970  2011
 5 Cabo Verde  3880.   5970  2011
 6 Cabo Verde  3880.   5970  2011
 7 Cabo Verde  3880.   5970  2011
 8 Cabo Verde  3880.   5970  2011
 9 Cabo Verde  3880.   5970  2011
10 Egypt       1399.   5610  2000
# ℹ 23,316 more rows

Removes region and gen from the model - insignificant. Simplifies the final model. ANOVA checks whther the gdp_pc and gni_pc are correlated. The significant F-test shows that these two variables are redundant.

Code
pois_model <- glm(data = SuicideRates, suicides ~ . -region - country - gni_pc - gen, family = poisson(link = "log")) 
# pois_model %>% summary()

# Calculate dispersion statistic
dispersion_ratio <- sum(residuals(pois_model, type = "pearson")^2) / pois_model$df.residual

print(dispersion_ratio) ## Very dispersed data
[1] 272.2121
Code
# Predict suicides for test data
predictions <- predict(pois_model, newdata = rates_test, type = "response")

# View predictions
library(Metrics)
# library(yardsticsk)

# Actual values from test set
actual <- rates_test$suicides

# Compute evaluation metrics
mae_value <- Metrics::mae(actual, predictions)    # Mean Absolute Error
rmse_value <- Metrics::rmse(actual, predictions)  # Root Mean Squared Error
r2_value <- 1 - (sum((actual - predictions)^2) / sum((actual - mean(actual))^2))  # R-squared

Poisson Model excludes region, country, gni_pc, and gen, uses the log link function. The dispersion statistic if variance > mean then the data is over-dispersed. Since our dispersion ratio is 216.7849, this means that the variance of our data is much higher than the mean. This makes sense given our high RMSE. Which is more sensitive to outlier values.

Code
library(ggplot2)

# Create a scatter plot of actual vs. predicted values
ggplot(data.frame(actual, predictions), aes(x = actual, y = predictions)) +
  geom_point(alpha = 0.5, color = "blue") +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +  # Perfect fit line
  labs(title = "Actual vs Predicted Suicides",
       x = "Actual Suicides",
       y = "Predicted Suicides") +
  theme_minimal()

Time Series Modeling: For time-sake I chose to focus solely on the United States

Code
library(dplyr)
# Aggregate suicide data by year and country
SuicideRatesTimeSeries <- SuicideRates %>%
  group_by(year, country) %>%
  summarise(
    total_suicides = sum(suicides, na.rm = TRUE),  # Sum suicides per country per year
    total_population = sum(pop, na.rm = TRUE),  # Total population per country per year
    suicide_rate_per_100k = round((total_suicides / total_population) * 1000000, 1),  # Rate per 1M
    .groups = "drop"
  )

# Convert data into time series for each country
CountryTimeSeries <- split(SuicideRatesTimeSeries, SuicideRatesTimeSeries$country) %>%
  lapply(function(df) ts(df$suicide_rate_per_100k, start = 1990, frequency = 1))

Groups data by year and country, to get the total suicides and population per country per year. This calculates suicide_rate_per_100k (suicide rates per 100,000 people). This is done because this is how the data of suicides are normally shown, for example United States 2022, 14.2 suicides per 100,000. Then we convert the time series data into the time series format (ts()), creating a list of time series objects for each country.

Running Time Series for USA

Code
library(astsa)
Code
# Country Time Series, using USA
USATimeSeries <- CountryTimeSeries[["United States of America"]]

Isolate the USA time series from the list of country-specific time series.

Code
ts.plot(CountryTimeSeries[["United States of America"]], 
        main = "USA Suicide Rate per 100K (1990-2022)", 
        ylab = "Suicide Rate per 100K", 
        xlab = "Year")

Looking at the Time Series, we can break the data down into subsections. From 1990-2000, the suicide rate had steadily declined, this can be explained by the economic growth that was seen in the United States. More specifically the dot-com boom which was fueled by the development of internet-based companies, the rise of the tech-based start ups, and the massive stock market growth. Around 2000-2007, the suicide rate had remained low this could be explained by the continued economic prosperity that was seen until 2007. From 2008 - 2018, there was another huge spike in the data leading to an increase in the suicide rates, this could be attributed to the 2008 financial crisis, where unemployment had surged, and many Americans faced a lot of financial stress, the Opioid epidemic and the rise of social media and cyber-bullying may also be factors. Slight dip from 2018-2020, recovery from poor economy. 2020 onward, COVID-19 epidemic, more financial stress, with the post pandemic society.

Forecasting USA future Suicide Rates (ARIMA)

Code
library(tseries)

# Check stationarity for the USA suicide rate
adf.test(CountryTimeSeries[["United States of America"]])

    Augmented Dickey-Fuller Test

data:  CountryTimeSeries[["United States of America"]]
Dickey-Fuller = -2.255, Lag order = 3, p-value = 0.4747
alternative hypothesis: stationary

Check stationarity, since not stationary I choose to difference to remove trends and make the data stationary.

Code
library(forecast)
UnitedStatesSeries <- diff(CountryTimeSeries[["United States of America"]])
# Plot Autocorrelation (ACF) and Partial Autocorrelation (PACF)
acf(CountryTimeSeries[["United States of America"]])

Code
# PACF
pacf(CountryTimeSeries[["United States of America"]])

ACF plot helps determine the MA terms while PACF helps determine the AR terms, combined together they guide in the ARIMA model selection. Here we see that the difference is 1, AR(1) uses one past value for prediction and MA(1) uses one lagged moving average term looking at the ACF and PACF.

Code
library(forecast)

# Fit ARIMA(1,1,1) model
arima_model <- Arima(CountryTimeSeries[["United States of America"]], order = c(1,1,1))
summary(arima_model)
Series: CountryTimeSeries[["United States of America"]] 
ARIMA(1,1,1) 

Coefficients:
         ar1      ma1
      0.8631  -0.6749
s.e.  0.1582   0.2238

sigma^2 = 0.07142:  log likelihood = -1.96
AIC=9.92   AICc=10.88   BIC=14.02

Training set error measures:
                     ME      RMSE       MAE       MPE     MAPE      MASE
Training set 0.03647585 0.2535284 0.2050503 0.3399001 1.999132 0.8744792
                   ACF1
Training set 0.01594972

The ARIMA model helps to forecast future suicide rates.

Code
#Forecast of the next 10 years of Suicide Rates in the United States
forecast_arima <- forecast(arima_model, h = 10)  # Forecast next 10 years
autoplot(forecast_arima) +
  labs(title = "ARIMA(1,1,1) Forecast: USA Suicide Rate per 100K",
       x = "Year", y = "Suicide Rate per 100K") +
  theme_minimal()

To explore future trends, I visualized a forecast using the ARIMA model. The goal was not just statistical modeling but to create a clear visual representation of where suicide rates may be heading based on past data. This forecast includes a confidence interval, helping to communicate uncertainty visually. The widening blue region in the plot shows the range of possible outcomes, making it an effective way to convey both trends and uncertainty at a glance.

The ARIMA(1,1,1) forecast suggests a moderate upward trend in suicide rates, with increasing uncertainty over time. The confidence intervals widen, indicating potential external influences like economic conditions or policy changes. While the model predicts some stability, further analysis with economic factors or alternative factors could improve accuracy. Comparing trends with other countries may also provide valuable context. But Given the low amount of time, this cannot be done.